Nested Calls

One option for handling subroutines is to store the return address in a special CPU register and, on return, copy that register back into the instruction pointer. This cannot handle nested calls: most reasonably complex programs use multiple subroutines that call each other, and as soon as a second call happens, the return address for the first call is overwritten.

To overcome this problem, return addresses are pushed onto a stack at the top of the running program's memory, which is treated separately and accessed directly through CPU registers

Machine Level Stacks

Stacks have two operations: Push, and Pop
The Intel x86 architecture uses main memory for the stack but accesses it via a register
ESP - stack pointer, always points to the memory address of the top item on the stack

Remember: the stack grows downwards in memory. The program code sits at the bottom of memory and goes up (position 0, 1, 2, etc.), while the program's stack starts at the top of its allocated memory and works its way down

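The push instruction: subtracts 4 from ESP, then stores its 32-bit operand at the address ESP now points to (pop does the reverse: it reads the value at ESP, then adds 4 to ESP)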
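
To make the downward growth concrete, here is a minimal C model of push and pop. It is only a sketch: the array stack_mem, the size STACK_BYTES and the function names push32 and pop32 are made up for illustration, and esp is a plain byte offset rather than a real register.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 64                  /* hypothetical size of the model stack */

uint32_t stack_mem[STACK_BYTES / 4];    /* stands in for main memory */
uint32_t esp = STACK_BYTES;             /* byte offset; starts just past the top */

/* push: move ESP down by 4 bytes, then store the value there */
void push32(uint32_t value) {
    assert(esp >= 4);                   /* no room left would be a stack overflow */
    esp -= 4;
    stack_mem[esp / 4] = value;
}

/* pop: read the value at ESP, then move ESP back up by 4 bytes */
uint32_t pop32(void) {
    assert(esp < STACK_BYTES);          /* nothing to pop would be an underflow */
    uint32_t value = stack_mem[esp / 4];
    esp += 4;
    return value;
}
```

Pushing 1 and then 2 leaves esp 8 bytes lower than where it started, and the two pops return 2 and then 1.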

Call and Return

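The call instruction: pushes the return address (the address of the instruction after the call) onto the stack, then jumps to the subroutine. The ret instruction pops that address back into the instruction pointer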

This allows for nested subroutines with correct, maintained return addresses
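
The effect of call and ret on nested subroutines can be sketched in C as below. This is a simplified model, not how the CPU is built: eip and esp are ordinary variables, the names call_sub and ret_sub are invented here, and the addresses used in the usage note are arbitrary.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 64                  /* hypothetical size of the model stack */

uint32_t stack_mem[STACK_BYTES / 4];
uint32_t esp = STACK_BYTES;             /* models the stack pointer */
uint32_t eip = 0;                       /* models the instruction pointer */

/* call: push the address of the next instruction, then jump to target.
   insn_len is the length in bytes of the call instruction itself. */
void call_sub(uint32_t target, uint32_t insn_len) {
    assert(esp >= 4);
    esp -= 4;
    stack_mem[esp / 4] = eip + insn_len;    /* the return address */
    eip = target;
}

/* ret: pop the most recent return address into the instruction pointer */
void ret_sub(void) {
    assert(esp < STACK_BYTES);
    eip = stack_mem[esp / 4];
    esp += 4;
}
```

With eip at 100, call_sub(200, 5) stores 105. If the subroutine then reaches address 210 and makes another call, 215 is stored on top of 105. The two ret_sub calls restore 215 and then 105, so neither return address is lost.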

Manipulating the Stack Pointer

ESP can be changed directly from the code
e.g. to take 8 bytes (two 32-bit items) off the stack: add esp, 8
Note that you can inspect any data on the stack as an offset to ESP
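
Both ideas, reading at an offset from ESP and discarding items by adding to ESP, can be modelled in a few lines of C. The names peek32 and discard are hypothetical helpers for this sketch, and the stack is again just an array.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 64                  /* hypothetical size of the model stack */

uint32_t stack_mem[STACK_BYTES / 4];
uint32_t esp = STACK_BYTES;

void push32(uint32_t value) {
    assert(esp >= 4);
    esp -= 4;
    stack_mem[esp / 4] = value;
}

/* like: mov eax, [esp+offset]   (reads an item without popping it) */
uint32_t peek32(uint32_t offset) {
    assert(offset % 4 == 0 && esp + offset < STACK_BYTES);
    return stack_mem[(esp + offset) / 4];
}

/* like: add esp, bytes   (drops items without reading them) */
void discard(uint32_t bytes) {
    assert(bytes % 4 == 0 && esp + bytes <= STACK_BYTES);
    esp += bytes;
}
```

After pushing 10, 20 and 30, peek32(0) is 30 and peek32(8) is 10. Calling discard(8) removes 30 and 20 in one step, leaving 10 on top.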

Subroutine Parameters

Pass By Value

A simple subroutine like this uses pass by value (values copied into registers)

; SUB bigger
bigger: cmp eax, ebx
		jl second
		ret
second: mov eax, ebx
		ret
; END bigger
...
mov eax, num1
mov ebx, num2
call bigger
mov max, eax

The code above executes max = bigger(num1, num2), i.e. max becomes the larger of num1 and num2
This depends on the caller and callee agreeing on which registers to use for the parameters and return value
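
For comparison, a C counterpart of bigger might look like the sketch below. The parameter names a and b are stand-ins for EAX and EBX; a C compiler decides for itself where the values really live.

```c
/* C counterpart of the bigger subroutine: both arguments arrive as copies,
   so the caller's variables are never modified. */
int bigger(int a, int b) {
    if (a < b) {        /* cmp eax, ebx / jl second (signed comparison) */
        a = b;          /* mov eax, ebx */
    }
    return a;           /* the result is left in EAX */
}
```

A call such as max = bigger(num1, num2) leaves num1 and num2 unchanged, because only their values were copied in.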

Pass By Reference

A subroutine that, for example, swaps two variables needs their memory locations, not just their values, so memory addresses are passed as parameters (pass by reference)

; SUB swap
swap:   mov ecx, [eax]
		mov edx, [ebx]
		mov [ebx], ecx
		mov [eax], edx
		ret
; END swap
...
lea eax, num1
lea ebx, num2
call swap

The caller and the callee still need to agree on which registers to use
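
The same swap written in C uses pointers, which play the role of the addresses loaded with lea. The local names first and second are illustrative stand-ins for ECX and EDX.

```c
/* C counterpart of the swap subroutine: a and b hold addresses, so the
   function can change the caller's variables. */
void swap(int *a, int *b) {
    int first = *a;     /* mov ecx, [eax] */
    int second = *b;    /* mov edx, [ebx] */
    *b = first;         /* mov [ebx], ecx */
    *a = second;        /* mov [eax], edx */
}
```

Calling swap(&num1, &num2) passes the two addresses, mirroring lea eax, num1 and lea ebx, num2.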

(Note)
Intel x86 has an instruction that can swap values inside two registers
xchg eax, ebx
One operand can be a memory location, but this is slow because the exchange is then implicitly locked (a concurrency safeguard)

Stacking Parameters

If many parameters are needed, or registers are already in use for other data, you can stack the parameters:

For example, for a rectangle area function:
Callee cleans the stack (stdcall)

; SUB area
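area:   pop ebx ; the return address
		pop edx ; height (pushed last)
		pop eax ; width
		mul edx ; EDX:EAX = EAX * EDX, so EAX holds the low 32 bits of the area
		push ebx ; put the return address back for ret
		ret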
; END area
...
push width
push height
call area
mov result, eax
...

Caller cleans the stack (cdecl)

; SUB area
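area:   mov eax, [esp+4]        ; height (pushed last, just above the return address)
		mul dword ptr [esp+8]   ; EAX = height * width (low 32 bits)
		ret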
; END area
...
push width
push height
call area
add esp, 8
mov result, eax
...
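
The difference between the two clean-up styles can be sketched with a small C model of the stack. Everything here is an illustrative assumption: the helper names, the made-up return address 0x1234, and the callee_cleans flag that selects the style. A real stdcall routine would typically remove its parameters with ret 8 rather than with manual pops.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 64                  /* hypothetical size of the model stack */

uint32_t stack_mem[STACK_BYTES / 4];
uint32_t esp = STACK_BYTES;

void push32(uint32_t value) {
    assert(esp >= 4);
    esp -= 4;
    stack_mem[esp / 4] = value;
}

uint32_t peek32(uint32_t offset) {      /* like reading [esp+offset] */
    assert(esp + offset < STACK_BYTES);
    return stack_mem[(esp + offset) / 4];
}

/* cdecl style: the callee removes only the return address */
uint32_t area_cdecl(void) {
    uint32_t result = peek32(4) * peek32(8);
    esp += 4;                           /* ret */
    return result;
}

/* stdcall style: the callee removes the return address and both parameters */
uint32_t area_stdcall(void) {
    uint32_t result = peek32(4) * peek32(8);
    esp += 4 + 8;                       /* like ret 8 */
    return result;
}

uint32_t call_area(uint32_t width, uint32_t height, int callee_cleans) {
    push32(width);                      /* push width */
    push32(height);                     /* push height */
    push32(0x1234);                     /* call area (made-up return address) */
    if (callee_cleans) {
        return area_stdcall();          /* nothing left for the caller to do */
    }
    uint32_t result = area_cdecl();
    esp += 8;                           /* add esp, 8 */
    return result;
}
```

In both cases esp ends up where it started; the conventions differ only in which side does the final adjustment.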

Calling Conventions

Caller and Callee must agree on a calling convention

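Immediately after the call instruction is executed, the return address is on top of the stack (at [esp]) and the parameters sit above it (from [esp+4] upwards)

If the callee loses track of the return address (for example by popping parameters without putting it back), ret jumps to the wrong place and the program will not work as intended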

Intel x86 Calling Conventions

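Four calling conventions are commonly used on the Intel x86 architecture (they are agreed by compilers and libraries rather than enforced by the hardware): cdecl, stdcall, fastcall and thiscall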

The fastcall and thiscall conventions are 'faster' when there are few parameters, because some are passed in registers, but they tie up registers that may be better used for something else

C library routines expect the programmer to use the cdecl convention.

I/O

I/O is hard in pure assembly, so C library routines are used instead:

External subroutines (C library code) can be called in the same way as assembly subroutines, but we must follow cdecl

printf - Send formatted output to the console
scanf - Wait for input from the console

Program Output

To produce output, printf is used. Its first parameter is the address of a format string; any values to be printed follow it

For example, to output a message:

#include <stdio.h>
#include <stdlib.h>

int main (void) {
	char msg[] = "Hello World\n";
	_asm {
		lea eax, msg
		push eax
		call printf
		pop eax
	}
	return 0;
}

Corrupted Registers

We don't know exactly what happens inside an external subroutine, but it will probably use registers and overwrite their contents (under cdecl, the callee is free to change EAX, ECX and EDX). Any register values that matter to the calling code must be saved before the call (use the stack)

Using the Stack

Save things onto the stack and then restore them after the external call returns
For example, for a program to output the string 10 times, the loop counter (ECX) must be maintained:

		mov ecx, 10
floop:  push ecx        ;counter onto the stack
		lea eax, msg
		push eax
		call printf
		pop eax
		pop ecx     ;bring the counter back
		loop floop

This ensures that the code works as intended even if the external subroutine uses ECX: its value is saved before the call and restored after the subroutine returns, before the loop instruction uses it
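
The save-and-restore pattern can be checked with a small C model. This is a sketch under stated assumptions: ecx is an ordinary variable standing in for the register, and noisy_external is a hypothetical routine that deliberately overwrites it, as an external cdecl routine is allowed to do.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 64                  /* hypothetical size of the model stack */

uint32_t stack_mem[STACK_BYTES / 4];
uint32_t esp = STACK_BYTES;
uint32_t ecx;                           /* models the ECX register */

void push32(uint32_t value) {
    assert(esp >= 4);
    esp -= 4;
    stack_mem[esp / 4] = value;
}

uint32_t pop32(void) {
    assert(esp < STACK_BYTES);
    uint32_t value = stack_mem[esp / 4];
    esp += 4;
    return value;
}

/* Hypothetical external routine that overwrites ECX */
void noisy_external(void) {
    ecx = 0xDEADBEEF;
}

/* Runs the loop body `times` times (times must be at least 1) and
   returns how many calls were actually made. */
int run_loop(uint32_t times) {
    int calls = 0;
    ecx = times;                        /* mov ecx, times */
    do {
        push32(ecx);                    /* push ecx: save the counter */
        noisy_external();               /* call printf */
        calls++;
        ecx = pop32();                  /* pop ecx: bring the counter back */
    } while (--ecx != 0);               /* loop floop */
    return calls;
}
```

run_loop(10) makes exactly 10 calls even though every call overwrites ecx. Without the push and pop, the counter would be lost after the first iteration.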

Outputting Values

Format Specifiers

Parameters must match the specifiers in the string

For example:

char msg[] = "The number is %d\n";
int num = 7;
_asm {
	push num      // Parameters pushed in reverse order (cdecl)
	lea eax, msg
	push eax
	call printf
	add esp, 8
}

Adding to ESP is a quick way to clean up multiple parameters at once
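
In plain C the same call is a single line, and the compiler generates the pushes and the clean-up. The sketch below uses snprintf, a sibling of printf that writes into a buffer, purely so the result can be inspected; format_number is a hypothetical name.

```c
#include <stdio.h>

/* Formats the message into out (at most size bytes, including the
   terminator) and returns the number of characters the full message needs. */
int format_number(char *out, size_t size, int num) {
    /* one %d specifier, so exactly one int argument follows the format string */
    return snprintf(out, size, "The number is %d\n", num);
}
```

The format string comes first in the C source, which is why it is pushed last in the assembly version: cdecl pushes parameters in reverse order.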

Program Input

To input data, scanf is used
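It takes two parameters here: a format string and the address of the variable that receives the input

Following the cdecl convention, the parameters are pushed in reverse order (the address first, then the format string) and the caller cleans the stack afterwards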

Strings can be taken as input if care is taken to reserve enough memory

For example:

char fmt[] = "%d";
int num;
_asm {
	lea eax, num // Remember the address is needed, not the value
	push eax     // Params pushed in reverse order
	lea eax, fmt
	push eax
	call scanf
	add esp, 8
}
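
The C equivalent makes the "address, not value" point visible through the & operator. The sketch below uses sscanf, which parses from a string instead of the console, so it can be tried without typing input. read_number, read_word and WORD_SIZE are names invented for this example.

```c
#include <stdio.h>

#define WORD_SIZE 16                    /* hypothetical buffer size for string input */

/* Reads one integer from text. out is an address, like the value pushed
   after lea eax, num. Returns the number of items converted (1 on success). */
int read_number(const char *text, int *out) {
    return sscanf(text, "%d", out);
}

/* Reads one word into a buffer of WORD_SIZE bytes. The width in %15s stops
   the read after 15 characters, leaving room for the terminating '\0'. */
int read_word(const char *text, char word[WORD_SIZE]) {
    return sscanf(text, "%15s", word);
}
```

read_number("42", &num) stores 42 in num. read_word shows what reserving enough memory means for strings: the buffer size and the width in the format string have to agree.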

Stacking Local Variables

In high level languages, subroutines can have local (internal) variables that only exist while the subroutine is active. This can be done in assembly using the stack

Stack Frames

Each time a subroutine is called, a new stack frame is created on the stack
This holds data that is needed by the subroutine: its parameters, the return address, the caller's saved EBP and its local variables

Building the Stack Frame

ESP always points to the top of the stack
EBP initially points to the base of the stack
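When a subroutine is called: the caller's EBP is pushed, EBP is set to the current ESP, and ESP is moved down to make room for local variables (push ebp / mov ebp, esp / sub esp, n)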
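
A common way to build and tear down a frame (push ebp / mov ebp, esp / sub esp, n on entry, then mov esp, ebp / pop ebp on exit) can be modelled in C as below. This is a sketch of that usual pattern, not necessarily the exact sequence these notes go on to use; enter_frame, leave_frame and frame_slot are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 128                 /* hypothetical size of the model stack */

uint32_t stack_mem[STACK_BYTES / 4];
uint32_t esp = STACK_BYTES;             /* top of the stack */
uint32_t ebp = STACK_BYTES;             /* base of the current frame */

void push32(uint32_t value) {
    assert(esp >= 4);
    esp -= 4;
    stack_mem[esp / 4] = value;
}

uint32_t pop32(void) {
    assert(esp < STACK_BYTES);
    uint32_t value = stack_mem[esp / 4];
    esp += 4;
    return value;
}

/* Entry: save the caller's EBP, start a new frame, reserve local space */
void enter_frame(uint32_t local_bytes) {
    push32(ebp);                        /* push ebp */
    ebp = esp;                          /* mov ebp, esp */
    assert(esp >= local_bytes);
    esp -= local_bytes;                 /* sub esp, local_bytes */
}

/* Exit: drop the locals and restore the caller's EBP */
void leave_frame(void) {
    esp = ebp;                          /* mov esp, ebp */
    ebp = pop32();                      /* pop ebp */
}

/* Address of [ebp+offset]: negative offsets are locals, +4 is the
   return address, +8 and above are parameters. */
uint32_t *frame_slot(int32_t offset) {
    uint32_t addr = ebp + (uint32_t)offset;
    assert(addr % 4 == 0 && addr < STACK_BYTES);
    return &stack_mem[addr / 4];
}
```

Because EBP stays fixed while the subroutine runs, parameters and locals keep the same offsets even when further pushes move ESP.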

Nested Calls and Stack Frames